Counting statistics for genetic switches based on effective interaction approximation 
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Applicability of counting statistics for a system with an infinite number of states is investigated. 
The counting statistics has been studied a lot for a system with a finite number of states. While it is 
possible to use the scheme in order to count specific transitions in a system with an infinite number 
of states in principle, we have non-closed equations in general. A simple genetic switch can be 
described by a master equation with an infinite number of states, and we use the counting statistics 
in order to count the number of transitions from inactive to active states in the gene. To avoid to 
have the non-closed equations, an effective interaction approximation is employed. As a result, it is 
shown that the switching problem can be treated as a simple two-state model approximately, which 
immediately indicates that the switching obeys non-Poisson statistics. 



I. INTRODUCTION 

Counting statistics is a scheme to calculate all statis- 
tics related to specific transitions in a stochastic system. 
In the counting statistics, a master equation with discrete 
states is used to derive time-evolution equations for gen- 
erating functions related to the specific transitions. The 
scheme has been used to investigate Forster resonance 
energy transfer, and many successful results have been 
obtained QHH. Although the scheme is basically formu- 
lated for a system with a finite number of states, it is 
possible to use the scheme to investigate a system with 
an infinite number of states. However, as exemplified 
later, we have non-closed equations in general, so that it 
would be needed to develop approximation schemes suit- 
able for specific systems. As a first step, it is important 
to check whether an approximation scheme for the count- 
ing statistics is available for the system with an infinite 
number of states or not. 

In the present paper, we focus on dynamics in genetic 
switches. It has been shown that stochastic behavior 
plays an important role in gene regulatory systems > 
and there are many studies for the stochasticity in the 
gene regulatory systems from experimental points of view 
(e.g., see an d theoretical ones (e.g., see HH). Not 
only studies by numerical simulations, but also those by 
analytical calculations have been performed. Some ana- 
lytical expressions for the static properties, i.e., station- 
ary distributions for the number of proteins or mRNAs, 
have already been obtained. In addition, in order to in- 
vestigate the role of the stochasticity in genetic switches, 
dynamical properties, i.e., switching behavior between 
active and inactive gene states, have also been studied. 
Basically, such dynamical properties have been investi- 
gated by numerical simulations (e.g., see [l7|); only for a 
simple system, analytical expressions for the first-passage 
time distribution have been obtained [llj]. The genetic 
switch is described by a master equation with an infinite 
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number of states. Hence, if we can use the scheme of the 
counting statistics in order to investigate the dynamical 
properties in the genetic switches, it will be helpful to ob- 
tain deeper understanding and intuitive pictures for the 
genetic switches. 

The aim of the present paper is to seek the applicability 
of the counting statistics in order to investigate the dy- 
namical property in the genetic switches. It immediately 
becomes clear that a straightforward application of the 
counting statistics derives intractable non-closed equa- 
tions. In order to obtain simple closed forms, we here 
employ an effective interaction approximation 19]. As a 
result, we will show that the switching problem can be 
treated as a simple two-state model approximately. This 
result immediately gives us intuitive understanding for 
the switching behavior and the non-Poissonian property. 

The present paper is constructed as follows. In Sec.llll 
we give a brief explanation of a stochastic model for the 
genetic switch. In Sec. HID the counting statistics is em- 
ployed in order to count the number of transitions in the 
genetic switch, and, as a result, a simple two-state model 
is derived approximately. The derived approximated re- 
sults are compared with those of Monte Carlo simulations 
in Sec. IIV1 Section [V] gives concluding remarks. 



II. MODEL 

A gene regulatory system consists of many compo- 
nents, such as genes, RNAs, and proteins. Here, a simpli- 
fied model is used; mRNAs are neglected for simplicity, 
and an activated gene assumes to directly increase the 
number of proteins. In addition, in the simplified model, 
a repressed gene cannot produce any proteins. The above 
model has been used to investigate the switching behav- 
ior in previous works, and, for example, see 13] for details 
of the model. 

We summarize the model studied in the present paper 
in Fig. [TJ The binding interaction is assumed to be a re- 
pressed one, and the gene is activated only when the reg- 
ulatory proteins are not binding the gene. The proteins 
are produced from the gene in the active state with rate 
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degradation 



FIG. 1: A schematic illustration of the self-regulating gene 
with repressed binding interaction. When the regulatory pro- 
teins are combining the gene, the gene is repressed and there 
is no production of proteins. If the regulatory proteins are 
released from the gene, the gene becomes active and it can 
produce the proteins. We consider the transition between the 
active and inactive states as a switch. 



assumption in the present paper. 

Let a n and (3 n be states in which there are n free pro- 
teins for the active and inactive states, respectively. The 
probabilities for a n and /?„ at time t satisfy the following 
master equations; 



dP(a n ,t) 
Jt 



dP(P n ,t) 
dt 



=g[P(on-i,t) - P(a n ,t)] 

+ d[{n + l)P(a n+1 , t) - nP(a n ,t)} 
-hnP{a n ,t) + fP(p n ,t), (1) 



--d[(n + l)P(/3 n+1 ,t)-nP(/3 n ,t)] 
+ hnP(a n ,t)-fP(/3 n ,t), 



(2) 



g, and proteins are degraded spontaneously with rate d. 
The regulatory proteins bind the gene with a rate func- 
tion %(n), where n is the number of free proteins. For 
example, H(n) = hn for a monomer interaction case, and 
T-L(n) = hn{n — l)/2 for a dimer interaction case, where 
h is a rate constant for the binding. / is a rate constant 
with which the regulatory proteins are released from the 
repressor site of the gene. 

We here give short comments for the model from the 
viewpoint of experiments. Using this simplified model, 
we can discuss the connection among the model parame- 
ters, the number of proteins, and the switching behaviors. 
While the number of proteins n can be observed or esti- 
mated experimentally, as far as we know, there has not 
been an experimental technique to observe the attach- 
ment and detachment of the regulatory proteins directly. 
We hope that developments of single-molecule observa- 
tions in future would enable us to give information about 
the switching dynamics. 

III. COUNTING STATISTICS FOR THE 
NUMBER OF TRANSITIONS 

A. Master equation for the number of proteins 

Analytical treatments for the self-regulating gene sys- 
tem have been developed, and an exact solution is known 
for the monomer interaction case, i.e., H(n) = hn 
[IllQil]- In order to simplify the analytical treatments, 
an additional assumption has been used in some previous 
works [H, [l{J; i.e., some of proteins are assumed to be 
inert when the gene state is active. The inert proteins 
cannot repress the gene, and it is not degraded. For the 
monomer interaction case, there is only one inert protein; 
the number of inert protein for the dimer interaction case 
is two, and so on. Note that the assumption of the in- 
ert proteins does not have physical meanings; this only 
simplify the analytical treatments (for details, see [H|). 
However, it has been shown that this assumption has lit- 
tle influence of the gene system, and then we employ the 



where P{a n ,t) and P(/3 n ,t) are probabilities for n free 
proteins for the active and inactive states, respectively. 

As stated in Sec. HI the exact solutions for stationary 
distributions of the number of proteins have been derived, 
and those are expressed using the Kummer confluent hy- 
pergeometric functions. For details, see [ill [l3j. 



B. Counting statistics 

Using the concept of the counting statistics [3-Q, it 
is possible to investigate dynamical properties, i.e., all 
statistics for the switching behavior between the active 
and inactive states. In the present paper, as an example, 
we calculate the number of transitions from the inactive 
state to the active state. The generating functions for 
the transitions are immediately obtained from the mas- 
ter equations ([1]) and ([2]). A brief explanation of the 
counting statistics is given in the Appendix, and we here 
give consequences of the counting statistics. 

A probability, with which there are k transitions from 
the inactive state to the active state during time t, is 
denoted by P(k\t). The generating function for P(k\i) is 
defined as 



F(A,t) = ^P(fc|t)A fc , (3) 

fc=0 

where A is a counting variable. The generating func- 
tion gives all information related to "inactive — > ac- 
tive" transitions. According to the scheme of counting 
statistics, we split F(X, t) into restricted generating func- 
tions {(f>.(a n , A, t)} and {4>(/3 n , A, t)}, where <f>(a n ,\,t) 
and <f>{p n ,\,t) are the generating functions for the sys- 
tem in states a n and /3 n at time t, respectively. Using the 
scheme of the counting statistics, we obtain the follow- 
ing time-evolution equations for the restricted generating 
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functions {<fi(a n , X,t)} and {<j)(/3 n , A, t)}: 
d<f>(a n ,X,t) 



dt 



dcj>{p n , X,t) 
dt 



= ff[^(on-i) A, t) - 0(a„, A, i)] 

+ d[(n + l)<p(a n+ i, A, i) - n<f>(a n , A, i)] 
-hn(()(a n ,\,t) + \f(l>(i3 n ,\,t), (4) 

=d[(n+ l)0(j8 n+ i,A,*) - ruj>{p n , A, t)] 
+ hn<j)(ct n ,\,t) - f<f>(p n ,\,t). (5) 



Although Eqs. (gj) and © are similar to Eqs. (JTJ) and ©, 
note that the final term in the right hand side of Eq. (gj 
has a factor A. The factor A is introduced in order to 
count the number of transitions, and we can count the 
number of transitions related to this term (for details, 
see Appendix). Using the above restricted generating 
functions, the generating function F(X, t) is calculated 
as 



F(X, t) = J2 W( a «' A ' *) + A ' *)} • 



(6) 



n=Q 



Next, we introduce the following generating functions 
for 4>(a n ,X,t) and </>(/?„, A, i): 



a(X, z,t) = y^ j <p(a n ,X, t)z n , 

n=0 
oo 

/3(A,z,t) = ^ </>(/?„, A, ^z". 



(7) 
(8) 



71=0 



It is straightforward to derive the time-evolution equa- 
tions for the new generating functions a(X,z,t) and 
/3(X,z,t) from Eqs. (g) and ©; 



da(X, z, t) 
dt 



d/3(X,z,t) 
dt 



={z-l) 



ga{X, z,t) — d 



da{X, z, t) 



dz 



hz 9a ^ Z,t) +Xm\z,t), (9) 



dz 

(z - l)d 



d/3(X,z,t) 
dz 



da(X,z,t) 

hz f/3(X,z,t). 

oz 



(10) 



Using the generating function a(X, z, t) and /3(A, z, t), the 
generating function F(X,t) is given by 

F(X, t) = a(X, z = l,t) + z = l, t), (11) 

and therefore it is enough to solve the following time- 
evolution equations in order to calculate the generating 
function ^(A, t): 



da(X,t) da(X,z,t) 
dt dz 



dfi(X, t) da(X, z, t) 



+ Xf0(X,t), (12) 



z = l 



dt 



dz 



fP(X,t), 



(13) 



where we define a(X,t) = a(X,z — l,t) and f3(X,t) = 
P(X,z = l,t). 

Note that Eqs. (fT2|) and ([T3l) contain the derivative of 
a(A, z, t) with respect to z. Hence, the equations are not 
closed. If these terms are expressed simply using a(A, t), 
we will have simultaneous differential equations written 
only by the generating functions a(X,t) and /3(A, i); i.e., 
we have closed equations and hence the obtained equa- 
tions may be solved analytically. In the following anal- 
ysis, an effective interaction approximation is employed, 
and we will show that the above statistics can be approx- 
imated by a simple two-state model. 



C. Approximation for the interaction 

In the effective interaction approximation, the interac- 
tion function TL(n) is replaced as a constant value. As 
shown in [TJj], the dependence of H(n) on n makes it dif- 
ficult to obtain analytical results, and it has been shown 
that the approximation gives qualitatively good results. 

Replacing the interaction function T~L(n) as 



H(n) = h, 



(14) 



where h is a constant, we obtain the following equations 
instead of Eqs. ([12]) and ([13]): 



(15) 
(16) 



da(X, t) 

dt 
d(3(X,t) 

dt 



---ha(X,t)+Xf/3(X,t), 
--ha(X,t)-fP(X,t). 



Note that Eqs. (|T5|) and flTBl are written only by a(X,t) 
and /3(X,t). It means that the switching problem can be 
approximated as a simple two-state model if the effective 
interaction h is chosen adequately. 

We here briefly explain the choice of the effective in- 
teraction h using a simple example, i.e., the monomer 
binding interaction case. For the monomer binding in- 
teraction, the interaction function is calculated as follows 
[l9j . In this case, the interaction function is hn. In or- 
der to obtain the effective interaction h, the number of 
proteins n is replaced as the average number of proteins, 
i.e., 



h = h(n) a , 



(17) 



where (n) a is the expectation of the number of free regu- 
latory proteins under a condition that the gene is in the 
active state (conditional expectation). 

The conditional expectation can be calculated from the 
stationary distribution of the number of proteins. Note 
that the generating functions a{X,z,t) and fi(X,z,i) are 
reduced to generating functions for the stationary distri- 
bution of the number of proteins when A = 1. Hence, as 
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shown in [19], they are written as follows. 

a{z) ee lim a(X = 1, z, t) = AF[a, b, N(z - 1)], (18) 

t—>CO 

f3(z) ee lim /3(A = l,z,t) 

t-^-OC 



1 + - 1 AF[a - 1, b - l,N(z - 1)] - a(z) 



(19) 



where A = f / (/ + ft,) and 



N = -, o = l + -, 6 = H — . 

ad a 

g, r) is the Kummer confluent hypergeometric func- 
tion, 



F{p,q,r) = ^2 



(p)» r n 

n=0 (?)» " ! ' 



(20) 



where (p)„ = p(p+l)(p+2) • • • (p+rc.— 1). We, therefore, 
obtain 



1 d 



a(l) dz 



a{z) 



g(d + f) 
d(d + f + h)' 



(21) 



By inserting Eq. (|2T|) into Eq. ([T7|) . the following self- 
consistent equation is derived: 



ft, = ft- 



g(d + f) 



d(d + f + ft) 
Solving Eq. ([22]) . we obtain 



h = 



-(d 2 + fd) + y/{d 2 + /d) 2 + 3hgd{d + /) 
2d 



(22) 



(23) 



We finally comment on a solution of the simple two- 
state model (Eqs. (fT5|) and (fTgl) ). The simple two-state 
model can be solved exactly [ll, Q, and the probability 
distribution P{k\t) for the number of "inactive — > active" 
transitions during time t is explicitly written as follows: 



P(k\t) = 



(i-7 2 )r 

27 J k\y/8jT/n 

x {2 7 (fc + T)4_ 1/2 ( 7 T) + (1 + 7 2 )r4 + i/ 2 (7r)} 

(24) 



where T = (/ + ft>/2, 7 2 = 1 - 4//3(l)/(/ + ft), and 
7 n (z) are modified Bessel functions of the first kind. This 
expression (I24[) immediately gives us the non-Poissonian 
picture of the phenomenon. 
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FIG. 2: Probability distributions for the number of "inactive 
— > active" transitions, (a) Monomer binding interaction case, 
(b) Dimer binding interaction case. In each figure, filled cir- 
cles and filled boxes are Monte Carlo results for time t = 10 
and t = 100, respectively. Solid and dashed lines corresponds 
to approximated analytical results of Eq. (I24|l for time t = 10 
and t = 100, respectively. 



analytical results with those of Monte Carlo simula- 
tions. The original genetic switch explained in Sec. [Ill 
was simulated using a standard Gillespie algorithm [20j . 
The parameters used in the simulation are as follows: 
d = 1, g = 50.0, ft = 0.004, / = 0.1. Note that 
these parameters were selected as one of the typical val- 
ues used in the previous works [p| [p| . 

Firstly, we consider the monomer binding interaction 
case. According to the discussions in Sec. 3.3, the value of 
the effective interaction ft, is calculated as ft = 0.173. Fig- 
ure HJa) shows the results of the analytical calculations 
(Eq. ([24)0 and those of the Monte Carlo simulations. 
Although there are quantitative differences, the results 
shows that the approximated two-state model captures 
the essential features of the phenomenon. 

Next, we consider a dimer binding interaction case, i.e., 
TL(n) = hn(n — l)/2. In this case, the effective interaction 
is calculated as follows: 



IV. NUMERICAL RESULTS 

In order to check the validity of the analytical treat- 
ments and the approximations, we here compare the 



ft = ft 



(n(n - l)) a 



(25) 



As shown in [19(, the effective interaction ft is obtained 
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by solving the following self-consistent equation: 
r hi d 2 . . 

h= 2W)^ a{Z) ^ (26) 

We here numerically solved the self-consistent equation 
(Eq. ([2H])), and the calculated value of the effective inter- 
action is h = 1.358. Using the calculated value, we depict 
the analytical results and the corresponding Monte Carlo 
results in Fig. [2jb). From the comparison, we confirmed 
that the approximated two-state model is available even 
in the dimer binding interaction case. Although results 
are not shown, we performed numerical simulations for 
other some parameters, and checked the validity of the 
analytical treatments. For example, even for parameter 
regions in which the probability distribution of the num- 
ber of proteins has bistability, the approximation scheme 
works well. 



systems [22J. We expect that the simple description 
developed in the present paper is available for various 
cases, such as complicated regulatory systems and time- 
dependent systems, and that the description gives new 
insights for the regulation mechanisms and stochastic be- 
haviors. 
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Appendix A: Generating function for counting 
statistics 



V. CONCLUSIONS 

In the present paper, we studied an analytical scheme 
to extract information related to the dynamical behavior 
in genetic switches. Using an effective interaction ap- 
proximation, a simple two-state model is obtained, and 
we confirmed that the two-state model captures the fea- 
tures of the phenomenon. Note that in the analytical 
treatments, we did not neglect the stochastic properties 
of the system (except for the effective interaction approxi- 
mation); i.e., we can calculate all statistics for transitions 
approximately, including higher order moments. It could 
be possible to apply the above effective expression for the 
transitions between the active and inactive states to more 
complicated gene regulatory networks without loss of the 
stochasticity; this would give us deeper understanding 
for the switching behavior of the gene regulatory sys- 
tems including static, dynamical, and stochastic behav- 
iors. In addition, the idea of the effective interaction may 
be similar to the mean-field approximation in statistical 
physics; the interaction is replaced with the average. It 
may be possible to develop higher-order approximations 
using the analogy with the conventional approximation 
schemes in statistical physics; this is an important future 
work. 

We discussed properties only in the stationary states, 
because the effective interaction approximation has been 
applied only for the stationary states at the moment; the 
average number of proteins (or higher moments) should 
be estimated adequately, and it was calculated by using 
the analytical solutions for the stationary distributions of 
the number of proteins. Recently, exact time-dependent 
solutions for a self-regulating gene have been derived [2l[ . 
Hence, it may be possible to extend the effective interac- 
tion approximation to non-stationary states. If so, the 
effective interaction h would be time-dependent, and, 
at least numerically, it is possible to calculate various 
moments for the counting statistics for time-dependent 



Here, we give a brief explanation for the counting 
statistics for readers' convenience (For details, see 043-) 
In the framework of counting statistics, the quantity of 
interest is the number of target transitions. It is needed 
to set multiple target transitions in the genetic switches, 
and the genetic switches have two states, i.e., active and 
inactive states. In the following explanations, a simple 
setting, in which there is only one transition matrix and 
only one target transition, will be discussed because it is 
straightforward to apply the following simple discussions 
to the genetic switches. 

Let {K nm } be a transition matrix. We here derive the 
generating function for counting the number of events of 
a specific target transition ia — ► Ja- Denote the prob- 
ability, with which the system starts from state m and 
finishes in state n with k transitions from ia to ja during 
time t, as P nm (k\t). In order to calculate the probability 
Pnm(k\t), we here define a probability G' kl (t) with which 
the system evolves from state I to state k, provided no 
*a — > JA transitions occur during time t. By using the 
probability G' kl (t), the probability P nm (k\t) is calculated 
as 

Pnm{k\t^ 

G' njA {t) * K jAiA (t)G' iAjA (t) * • • • * K jAiA (t)G' iAjA (t) 

V v ' 

fc-1 

* K jAiA (t)G' iAm (t), (Al) 

where g\(t) * g 2 (t) = f Q gi(t — t')g 2 (t')dt' denotes the 
convolution. This formulation means that an occurrence 
of the target transition za — > Ja is sandwiched in between 
situations with no occurrence of the target transition, 
and it is repeated k times. 

Next, we construct the generating function (j) nm {Xit) 
of the probability P nm (k\t): 

oo 

km(X,t) = ^Pnm{k\t). (A2) 
fc=0 
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That is, the generating function n m(A, t) gives the 
statistics of the number of transition — > j'a during 
time t under the condition that the system starts from 
state m and ends in state n. The generating function 
<j)nm{\~t) satisfies the following integral equation 

0nm(A,t) 

- G'nmit) + [ G' 1llA (t - t')XK ]AlA (t% Am (X,t')dt', 
Jo 

(A3) 

and obeys the following time-evolution equation 

d ~ . . 
-r;(Pn m {\t) 

^n,j A Kj A i A (t)G iAm (t) 

i 

+ XG' njA {Q)K jAiA {t)4> iAm {t) 

+ J* (jG'njSt 0) \K JAlA {t')4> lAm {t')dt> 

= }^K n i(t)4>i m (X,t) - 5„,j A (l - X)Kj A i A (t)4>i Am (X,t), 

(A4) 

where <p nm (A, 0) = 5 n . m . In order to show (| A4|l . we used 
the following two facts. Firstly, the probability of no 
target transitions, G' nm (t), obeys 

T^mW = ^ Kni(t)G' im (t) - 5 n ,j A K ]A i A {t)G' iAm {t), 

(A5) 



where G' nm (0) = S n , m - Secondly, the derivative of the 
convolution is given by 

j t j\i{t-t')g 2 (t')dt' 

= 9i(P)92{t) + £ {j t 9i{t - t')\ g 2 (f)dt'. (A6) 

Using the generating function ^ n m(A,t), we construct 
restricted generating functions {(f> n (X,t)} as follows: 

<f> n (A, t) = ^m(A, t)p m (0), ( A7) 

m 

where p m (0) is a probability distribution at initial time 
t = 0. From (|A4[) and (|A7|) . the restricted generating 
function satisfies 

d . 

— (pn{X,t) 

= J2 K m(t)MKt) - S n , jA (l - X)K jAiA (t)& A (X,t), 

(A8) 

and these equations should be solved with initial condi- 
tions <p n (X, 0) = J2 m 0nm(A, 0)p m (0) = _p„(0). The sum- 
mation of {0„(A, i)} for n gives the objective generating 
function for counting the number of events of the specific 
target transition. 
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